Credit Card Users Churn Prediction

Context

Thera Bank recently saw a steep decline in the number of users of its credit card. Customers leaving the credit card service would cause the bank a loss, so the bank wants to analyze its customer data, identify the customers who are likely to leave the service, and understand their reasons for leaving, so that it can improve in those areas.

Objective

Key questions to be answered

Data Description

  1. CLIENTNUM: Client number. Unique identifier for the customer holding the account
  2. Attrition_Flag: Internal event (customer activity) variable - if the account is closed then "Attrited Customer" else "Existing Customer"
  3. Customer_Age: Age in Years
  4. Gender: Gender of the account holder
  5. Dependent_count: Number of dependents
  6. Education_Level: Educational Qualification of the account holder - Graduate, High School, Unknown, Uneducated, College (refers to a college student), Post-Graduate, Doctorate.
  7. Marital_Status: Marital Status of the account holder
  8. Income_Category: Annual Income Category of the account holder
  9. Card_Category: Type of Card
  10. Months_on_book: Period of relationship with the bank
  11. Total_Relationship_Count: Total no. of products held by the customer
  12. Months_Inactive_12_mon: No. of months inactive in the last 12 months
  13. Contacts_Count_12_mon: No. of Contacts between the customer and bank in the last 12 months
  14. Credit_Limit: Credit Limit on the Credit Card
  15. Total_Revolving_Bal: The balance that carries over from one month to the next is the revolving balance
  16. Avg_Open_To_Buy: Open to Buy refers to the amount left on the credit card to use (Average of last 12 months)
  17. Total_Trans_Amt: Total Transaction Amount (Last 12 months)
  18. Total_Trans_Ct: Total Transaction Count (Last 12 months)
  19. Total_Ct_Chng_Q4_Q1: Ratio of the total transaction count in the 4th quarter to the total transaction count in the 1st quarter
  20. Total_Amt_Chng_Q4_Q1: Ratio of the total transaction amount in the 4th quarter to the total transaction amount in the 1st quarter
  21. Avg_Utilization_Ratio: Represents how much of the available credit the customer spent

Load and explore the dataset

Import the necessary packages

Let's start by importing libraries we need.

Read the dataset

View the first and last 5 rows and a random sample of the dataset.
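
A minimal sketch of reading and previewing the data. The notebook's file name is not shown, so a small in-memory CSV with made-up rows stands in for it; the names `csv_text`, `data`, and `df` are assumptions for illustration.

```python
import io

import pandas as pd

# Toy CSV standing in for the real file (file name and rows are hypothetical).
csv_text = """CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Credit_Limit
1001,Existing Customer,45,M,12691.0
1002,Existing Customer,49,F,8256.0
1003,Attrited Customer,51,M,3418.0
1004,Existing Customer,40,F,3313.0
1005,Attrited Customer,62,F,1438.3
1006,Existing Customer,37,M,4010.0
"""

data = pd.read_csv(io.StringIO(csv_text))
df = data.copy()  # work on a copy so the raw data stays untouched

first_rows = df.head()                         # first 5 rows
last_rows = df.tail()                          # last 5 rows
random_rows = df.sample(n=3, random_state=1)   # reproducible random sample

print(first_rows)
print(last_rows)
print(random_rows)
```

With the real file, `pd.read_csv` would take the file path instead of the `StringIO` object.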

Observations:

Understand the shape of the dataset.

Check the columns in the data

Check the data types of the columns

Observations:

Fixing the data types
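
One common fix, sketched here on toy rows, is to convert the text columns to the pandas `category` dtype. Which columns the notebook actually converts is not shown, so the list `cat_cols` is an assumption.

```python
import pandas as pd

# Toy rows; the column names follow the data description above.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "Card_Category": ["Blue", "Silver", "Blue", "Gold"],
    "Customer_Age": [45, 49, 51, 40],
})

print(df.dtypes)  # text columns are not categorical yet

# Assumed list of columns to convert.
cat_cols = ["Gender", "Card_Category"]
for col in cat_cols:
    df[col] = df[col].astype("category")

print(df.dtypes)  # the listed columns are now `category`
```

The `category` dtype stores each distinct label once, which reduces memory use and makes the intent of the column explicit.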

Check for missing values
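
A short sketch of a missing-value check on toy rows. The missing entries below are invented for illustration and say nothing about the real data.

```python
import numpy as np
import pandas as pd

# Toy rows with a few gaps (hypothetical).
df = pd.DataFrame({
    "Education_Level": ["Graduate", np.nan, "High School", "Doctorate", np.nan],
    "Marital_Status": ["Married", "Single", np.nan, "Married", "Single"],
    "Customer_Age": [45, 49, 51, 40, 62],
})

missing_count = df.isnull().sum()                   # missing cells per column
missing_pct = (df.isnull().mean() * 100).round(2)   # same, as a percentage
missing_summary = pd.DataFrame({"count": missing_count, "percent": missing_pct})

print(missing_summary)
```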

Observations:

Check for duplicate values
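
A sketch of a duplicate check on toy rows, looking both at fully repeated rows and at repeated values of the identifier column.

```python
import pandas as pd

# Toy rows; the second and third rows are identical on purpose.
df = pd.DataFrame({
    "CLIENTNUM": [1001, 1002, 1002, 1003],
    "Customer_Age": [45, 49, 49, 51],
})

n_duplicate_rows = df.duplicated().sum()              # fully identical rows
n_duplicate_ids = df["CLIENTNUM"].duplicated().sum()  # repeated identifiers

deduplicated = df.drop_duplicates().reset_index(drop=True)

print(n_duplicate_rows, n_duplicate_ids, len(deduplicated))
```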

Summary of the dataset
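
A sketch of a statistical summary on toy rows, with numeric and categorical columns summarized separately. The values are made up.

```python
import pandas as pd

# Toy rows (hypothetical values).
df = pd.DataFrame({
    "Customer_Age": [40, 50, 60, 50],
    "Credit_Limit": [3000.0, 12000.0, 4500.0, 8000.0],
    "Gender": pd.Categorical(["F", "F", "M", "F"]),
})

# Numeric columns: count, mean, std, min, quartiles, max.
numeric_summary = df.describe().T

# Categorical columns: count, number of unique labels, most frequent label, its frequency.
categorical_summary = df.describe(include=["category"]).T

print(numeric_summary)
print(categorical_summary)
```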

Observations:

Observations:

Number of unique values in each column

Let's look at the unique values of all the categories
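
A sketch of both checks on toy rows: the number of distinct values per column, then the frequency of each label in the categorical columns. The list `cat_cols` is an assumption.

```python
import pandas as pd

# Toy rows (hypothetical values).
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M", "F"],
    "Card_Category": ["Blue", "Silver", "Blue", "Gold", "Blue"],
    "Dependent_count": [3, 5, 3, 4, 3],
})

unique_counts = df.nunique()  # distinct values per column
print(unique_counts)

cat_cols = ["Gender", "Card_Category"]  # assumed categorical columns
for col in cat_cols:
    # dropna=False keeps missing values visible as their own row
    print(df[col].value_counts(dropna=False))
    print("-" * 40)
```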

Observations:

Data Pre-Processing

EDA

Univariate analysis

Observations on Customer_Age

Observations on Dependent_count

Observations on Months_on_book

Observations on Total_Relationship_Count

Observations on Months_Inactive_12_mon

Observations on Contacts_Count_12_mon

Observations on Credit_Limit

Observations on Total_Revolving_Bal

Observations on Avg_Open_To_Buy

Observations on Total_Amt_Chng_Q4_Q1

Observations on Total_Trans_Amt

Observations on Total_Trans_Ct

Observations on Total_Ct_Chng_Q4_Q1

Observations on Avg_Utilization_Ratio

Observations on Attrition_Flag

Observations on Gender

Observations on Education_Level

Observations on Marital_Status

Observations on Income_Category

Observations on Card_Category

Bivariate Analysis

Correlation between numerical variables
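
A sketch of a correlation matrix over the numeric columns only. The data is synthetic, and the relationship between the toy columns is built in purely so that the output has something to show.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
credit_limit = rng.uniform(1500, 30000, size=200)
revolving_bal = rng.uniform(0, 1500, size=200)

# Synthetic data; the third column is constructed from the first two
# for illustration only.
df = pd.DataFrame({
    "Credit_Limit": credit_limit,
    "Total_Revolving_Bal": revolving_bal,
    "Avg_Open_To_Buy": credit_limit - revolving_bal,
    "Gender": rng.choice(["M", "F"], size=200),
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

The resulting matrix can be passed to a heatmap function such as `seaborn.heatmap` for a visual check.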

Let's check the variation in Attrition_Flag with some of the other variables.
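
A sketch of two simple target comparisons on toy rows: a row-normalized cross-tabulation for a categorical variable and a grouped median for a numeric one. The values are invented and carry no meaning about the real customers.

```python
import pandas as pd

# Toy rows (hypothetical values).
df = pd.DataFrame({
    "Attrition_Flag": [
        "Existing Customer", "Attrited Customer", "Existing Customer",
        "Existing Customer", "Attrited Customer", "Existing Customer",
    ],
    "Gender": ["M", "F", "F", "M", "F", "M"],
    "Total_Trans_Ct": [80, 35, 70, 90, 40, 60],
})

# Share of each target class within every Gender group (rows sum to 1).
share_by_gender = pd.crosstab(df["Gender"], df["Attrition_Flag"], normalize="index")

# Median transaction count per target class.
median_trans_ct = df.groupby("Attrition_Flag")["Total_Trans_Ct"].median()

print(share_by_gender)
print(median_trans_ct)
```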

Target vs Dependent_count

Target vs Months_on_book, Customer_Age

Target vs Total_Relationship_Count

Target vs Months_Inactive_12_mon, Contacts_Count_12_mon

Target vs Credit_Limit, Avg_Open_To_Buy

Target vs Total_Revolving_Bal, Avg_Utilization_Ratio

Target vs Total_Amt_Chng_Q4_Q1

Target vs Total_Trans_Amt, Total_Trans_Ct

Target vs Total_Ct_Chng_Q4_Q1

Target vs Gender

Target vs Education_Level

Target vs Marital_Status

Target vs Income_Category

Target vs Card_Category

Data Pre-Processing Contd.

Missing Value Treatment

The imputed values might not always be integers, which is not the best way to impute categorical values.

Data Preparation for Modeling

Imputing Missing Values
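
One option that avoids non-integer results for categorical columns is to impute with the most frequent label. This is a sketch under that assumption, not necessarily the notebook's method. The imputer is fitted on the training split only and then reused on the test split, so no information leaks from test to train.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy splits with gaps (hypothetical values).
X_train = pd.DataFrame({
    "Education_Level": ["Graduate", "Graduate", np.nan, "High School"],
    "Marital_Status": ["Married", np.nan, "Single", "Married"],
}, dtype=object)
X_test = pd.DataFrame({
    "Education_Level": [np.nan, "Doctorate"],
    "Marital_Status": ["Single", np.nan],
}, dtype=object)

cols = ["Education_Level", "Marital_Status"]
imputer = SimpleImputer(strategy="most_frequent")

# Learn the most frequent label per column from the training data only.
X_train[cols] = imputer.fit_transform(X_train[cols])
# Apply the same learned labels to the test data.
X_test[cols] = imputer.transform(X_test[cols])

print(X_train)
print(X_test)
```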

Creating Dummy Variables
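
A sketch of one-hot encoding with `pd.get_dummies` on toy splits. The test split is re-indexed to the training columns because a label missing from one split would otherwise produce mismatched columns.

```python
import pandas as pd

# Toy splits (hypothetical values).
X_train = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Card_Category": ["Blue", "Silver", "Blue"],
    "Customer_Age": [45, 49, 51],
})
X_test = pd.DataFrame({
    "Gender": ["F", "M"],
    "Card_Category": ["Blue", "Blue"],
    "Customer_Age": [40, 62],
})

# drop_first=True drops one level per variable to avoid redundant columns.
X_train_enc = pd.get_dummies(X_train, drop_first=True)

# Align test columns with train; dummies absent from test are filled with 0.
X_test_enc = pd.get_dummies(X_test, drop_first=True).reindex(
    columns=X_train_enc.columns, fill_value=0
)

print(X_train_enc)
print(X_test_enc)
```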

Building the model

Model evaluation criterion

The model can make wrong predictions in two ways:

  1. Predicting a customer will leave the credit card services and the customer doesn't leave - Loss of resources
  2. Predicting a customer will not leave the credit card services and the customer attrites - Loss of opportunity

Which case is more important?

How can we reduce this loss, i.e., reduce False Negatives?
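
If false negatives are the costlier error, recall is the metric that penalizes them directly: recall = TP / (TP + FN). A small worked example, assuming the label 1 stands for an attrited customer and using made-up predictions:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Assumed encoding: 1 = attrited customer, 0 = existing customer.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() returns the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

recall_by_hand = tp / (tp + fn)          # 2 / (2 + 2)
recall = recall_score(y_true, y_pred)    # same value from scikit-learn

print(tn, fp, fn, tp)   # 5 1 2 2
print(recall)           # 0.5
```

Here two of the four attrited customers were missed, so recall is 0.5 even though 7 of the 10 predictions are correct.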

Model building using KFold and cross_val_score

Let's evaluate the model performance by using KFold and cross_val_score
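
A sketch of the evaluation loop on synthetic, imbalanced data. It uses `StratifiedKFold` rather than plain `KFold` so that every fold contains minority-class rows; that swap, the two models, and the recall scoring are assumptions, not the notebook's confirmed setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data, with roughly 16% positives.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.84, 0.16], random_state=1
)

# Assumed candidate models.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=1),
    "Gradient boosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
}

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

cv_recall = {}
for name, model in models.items():
    # One recall score per fold; the mean summarizes the model.
    scores = cross_val_score(model, X, y, scoring="recall", cv=kfold)
    cv_recall[name] = scores.mean()
    print(f"{name}: mean CV recall = {cv_recall[name]:.3f}")
```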

Model building with Oversampled train data (using SMOTE)
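
To show what SMOTE does conceptually, here is a dependency-light sketch of SMOTE-style interpolation for a binary target: each synthetic row lies on the line segment between a minority row and one of its nearest minority neighbors. It is not the imbalanced-learn implementation; in practice `imblearn.over_sampling.SMOTE` with `fit_resample` would be used. The function name `smote_like_oversample` is hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, minority_label=1, k=5, random_state=1):
    """Add synthetic minority rows until both classes have the same size."""
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbors: each point is returned as its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbor_idx = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)             # starting rows
    chosen = neighbor_idx[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))                               # position on the segment
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_over = np.vstack([X, synthetic])
    y_over = np.concatenate([y, np.full(n_new, minority_label)])
    return X_over, y_over


# Synthetic training data: 15 minority rows, 85 majority rows.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 15 + [0] * 85)

X_over, y_over = smote_like_oversample(X_train, y_train)
print(np.bincount(y_train), np.bincount(y_over))
```

Only the training data should be resampled; validation and test data keep their original class balance.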

Model building with Undersampled train data (using Random Under Sampler)
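
Random undersampling keeps all minority rows and a random subset of the majority rows of the same size. The sketch below implements that idea with NumPy only; it is not the imbalanced-learn `RandomUnderSampler`, and the function name `random_undersample` is hypothetical.

```python
import numpy as np


def random_undersample(X, y, random_state=1):
    """Keep an equal number of rows per class, chosen at random without replacement."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()  # size of the smallest class
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Synthetic training data: 15 minority rows, 85 majority rows.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 15 + [0] * 85)

X_under, y_under = random_undersample(X_train, y_train)
print(np.bincount(y_train), np.bincount(y_under))
```

The trade-off is that most majority rows are discarded, so the model trains on far less data.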

Hyperparameter tuning using RandomizedSearchCV
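
A sketch of a randomized search on synthetic data. The estimator, the parameter ranges, and the recall scoring are assumptions chosen to keep the example fast; they are not the notebook's actual grids.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the training data.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.84, 0.16], random_state=1
)

# Assumed search space (illustrative values only).
param_distributions = {
    "n_estimators": [25, 50, 75, 100],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "max_depth": [2, 3],
}

search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=1),
    param_distributions=param_distributions,
    n_iter=5,               # number of random parameter combinations to try
    scoring="recall",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=1),
    random_state=1,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a model refitted on all of `X` with the best parameters found.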

XGBoost - Regular Train Data

XGBoost - Oversampled Train Data

XGBoost - Undersampled Train Data

Adaboost - Undersampled Train Data

Gradient Boosting Classifier - Undersampled Train Data

Comparing all models

Performance on the test set

Feature Importances
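
A sketch of reading feature importances from a fitted tree ensemble. The data is synthetic and the column names are borrowed from the data description only as labels, so the ranking printed here says nothing about the real features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Names used purely as labels on synthetic columns.
feature_names = ["Total_Trans_Ct", "Total_Trans_Amt", "Total_Revolving_Bal", "Customer_Age"]

X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=1
)
X = pd.DataFrame(X, columns=feature_names)

model = GradientBoostingClassifier(n_estimators=50, random_state=1).fit(X, y)

# Impurity-based importances, normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)

print(importances)
```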

Pipelines for productionizing the model

Now that we have a final model, let's use pipelines to put the model into production.

Column Transformer
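
A sketch of a production-style pipeline on toy rows: a `ColumnTransformer` imputes numeric columns and imputes plus one-hot encodes categorical columns, and the result feeds a classifier. The column lists, imputation strategies, and final estimator are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy rows with gaps (hypothetical values); 1 = attrited, 0 = existing.
X = pd.DataFrame({
    "Customer_Age": [45, 49, 51, 40, 62, 37, 55, 48],
    "Total_Trans_Ct": [80, 35, np.nan, 90, 40, 60, 30, 75],
    "Gender": pd.Series(["M", "F", "F", "M", "F", "M", "F", "M"], dtype=object),
    "Education_Level": pd.Series(
        ["Graduate", np.nan, "High School", "Graduate",
         "Doctorate", np.nan, "Graduate", "High School"], dtype=object),
})
y = np.array([0, 1, 0, 0, 1, 0, 1, 0])

numeric_features = ["Customer_Age", "Total_Trans_Ct"]
categorical_features = ["Gender", "Education_Level"]

numeric_transformer = Pipeline([("imputer", SimpleImputer(strategy="median"))])
categorical_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),  # unseen labels become all zeros
])

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features),
    ],
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([
    ("preprocess", preprocessor),
    ("classifier", GradientBoostingClassifier(n_estimators=20, random_state=1)),
])

model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Because preprocessing lives inside the pipeline, the same fitted object can be applied to raw new data with a single `predict` call.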

Business Insights and Recommendations